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ABSTRACT 

This  paper  contains  two  new  results  pertaining  to  the 
initial  transient  problem  for  steady-state  simulations. 
Our  first  result  rigorously  establishes  the  asymptotic 
superiority  of  a  few  long  replications  relative  to  a 
large  number  of  shorter  replications,  assuming  that 
no  initial  transient  deletion  is  attempted.  Our  sec¬ 
ond  result  concerns  an  initial  transient  detection  test 
proposed  by  Schruben;  we  develop  asymptotics  that 
are  suggestive  of  the  types  of  initial  transients  that 
the  test  is  capable  of  detecting.  As  one  might  ex¬ 
pect,  the  ability  to  detect  a  non-stationarity  in  the 
simulation  output  depends  both  on  the  magnitude  of 
the  non-stationarity  of  the  initial  condition,  and  the 
degree  of  autocorrelation  in  the  process. 

Keywords:  initial  transient,  steady-state  simula¬ 
tion,  Markov  chains. 

1  INTRODUCTION 

Let  X  =  {X{t)  :  t  >  0)  be  a  real- valued  stochastic 
process  representing  the  output  of  a  discrete-event 
simulation.  We  say  that  X  possesses  a  steady-state 
if  there  exists  a  (deterministic)  constant  a  such  that 

X{t)  =  \  f  X{s)ds^  a  (1) 

i  Jo 

as  t  — )•  oo,  where  denotes  convergence  in  distri¬ 
bution,  and  =  a  definition.  Assuming  (1)  holds,  the 
parameter  a  is  referred  to  as  the  steady-state  mean  of 
X ;  the  steady-state  simulation  problem  is  concerned 
with  the  use  of  simulation  to  numerically  compute  a. 

Of  course,  typically  the  distribution  under  which  X 
is  initiated  is  atypical  of  the  steady-state  of  X,  and 
consequently  the  initial  portion  of  the  simulation  is 
not  representative  of  the  steady-state.  As  a  result, 
the  initial  portion  of  the  simulation  contains  output 
that  is  biased  relative  to  the  steady-state.  The  initial 
transient  problem  is  concerned  with  the  study  of  this 


bias,  and  the  development  of  associated  procedures 
that  can  be  used  to  mitigate  its  impact. 

Note  that  the  law  of  large  numbers  (1)  suggests  us¬ 
ing  X{t)  as  an  estimator  for  a.  To  assess  the  rate 
of  convergence  of  X{t)  to  a,  it  is  common  practice 
to  construct  a  confidence  interval  for  a  based  on  the 
estimator  X{t).  In  great  generality,  it  is  known  that 
the  law  of  large  numbers  (1)  is  typically  accompa¬ 
nied  by  a  central  limit  theorem  (CLT);  see,  for  ex¬ 
ample,  Glynn  and  Iglehart  (1990)  for  a  discussion  of 
this  phenomenon.  The  CLT  asserts  the  existence  of 
a  (deterministic)  constant  a  such  that 

-  a)  aNiO,  1)  (2) 

as  ^  ►  oo,  where  ^"(0,1)  denotes  a  normal  r.v. 

with  zero  mean  and  unit  variance;  the  constant 
is  known  as  the  time- average  variance  of  X.  In  order 
to  construct  confidence  intervals  for  a  based  on  (2), 
it  is  necessary  to  either  estimate  directly^  or  to 
somehow  develop  a  limit  theorem  involving  X{t)  in 
which  is  “cancelled  out” .  Direct  estimation  of 
is  a  special  challenge  in  this  estimation  setting,  since 
the  estimator  X{t)  is  effectively  constructed  from 
just  one  replication  of  X,  and  consequently  sophisti¬ 
cated  methods  need  to  be  applied  in  order  to  estimate 
Such  methods  include  the  regenerative  method, 
spectral  analysis  techniques,  and  auto-regressive  ap¬ 
proaches;  see  Bratley,  Fox,  and  Schrage  (1987)  for 
details. 

To  avoid  this  variance  estimation  difficulty,  one  can 
instead  replicate  X  m  independent  times,  running 
each  replication  to  time  t/ra  so  that  the  total  time 
horizon  simulated  is  identical  to  that  needed  for  X{t). 
Letting  Xi,  X2, . . .  be  the  associated  replications  of 
X,  this  leads  to  the  multiple  replications  estimator 

1 

am{t)  =  —  ^Xi{t/m), 

where  Xi{t)  =  t~^  Jg  ds.  Given  that  m  >  2,  an 


obvious  estimator  for  the  variance  of  am{t)  is  then 


and 


^  TU 

=  m(m-l) 

Of  course,  one  difficulty  with  the  multiple  replications 
approach  just  described  is  that  the  initial  transient 
phase  is  itself  replicated  in  each  of  the  m  replications, 
thereby  magnifying  the  impact  of  the  initial  transient. 
This  suggests  that  a  few  long  replications  ought  to 
be  superior  to  many  shorter  replications.  In  section 
2,  this  result  is  rigorously  established  by  using  the 
criterion  of  mean  square  error  (MSE)  as  a  basis  for 
comparison. 

In  section  3,  we  consider  an  initial  bias  detection 
test  proposed  by  Schruben  (1982).  We  use  a  class 
of  autoregressive  processes  to  study  the  sensitivity  of 
the  test  to  the  initial  condition  of  the  simulation,  and 
autocorrelation  structure  of  the  process. 

2  THE  SUPERIORITY  OF  A  FEW  LONG 
RUNS 

In  this  section,  we  shall  analyze  the  asymptotic  MSE 
of  ay„(^)  as  i  oo.  In  Glynn  (1987),  the  multiple 
replicates  estimator  am{t)  was  studied  using  meth¬ 
ods  of  weak  convergence;  however,  most  of  the  theory 
presented  there  assumes  that  m  oo  os  t  oo.  In 
our  current  discussion,  however,  we  are  especially  in¬ 
terested  in  the  case  of  small  to  moderate  values  of  m, 
and  our  tools  are  consequently  somewhat  different. 
The  basic  assumption  underlying  the  analysis  of 
this  section  is: 

A1  There  exist  constants  h  and  c,  and  a  positive  con¬ 
stant  7  such  that 

EX{t)  =  a+j  +  0[e-'^^)  (3) 

and 

var  X(<)  =  Y  +  ^  +  0(e-T'‘)  (4) 

as  ^  (X). 

It  is  shown  in  Glynn  (1984)  that  such  asymptotic 
expansions  hold,  for  example,  when  A  is  a  finite- 
state,  irreducible,  continuous-time  Markov  chain.  We 
shall  provide  an  alternative  justification  for  this  as¬ 
sumption  later  in  this  section. 

With  (3)  and  (4)  at  our  disposal,  observe  that 

Eam{t)  =  EX{t/m) 

=  a  +  ^  +  OCe-T'*/”') 


var  am{t)  ==  —  var  Xii/m) 

m 

=  Y  +  ^  +  0(e-‘/-) 

as  t  ^  oo.  Consequently,  the  MSE  of  a^(t)  as  an 
estimator  of  a  can  be  expressed  as 

E{am{t)  -  =  var  a^it)  +  (Eamit)  -  af 

=  y  +  +  + 

We  can  summarize  the  above  discussion  with  the 
following  proposition. 


Proposition  1  Under  Al, 

E{a^{t)  -  a)2  ^  ^ 

as  t  oo. 


Note  that  so  long  as  c  is  non-negative,  it  is  clear 
that- setting  m—1  minimizes  MSE.  However,  even  if 
c  is  negative,  it  is  never  optimal,  from  the  standpoint 
of  asymptotic  MSE,  to  choose  m  strictly  greater  than 
max(l,  “c/6^).  Since  c  and  b  are  typically  unknown,  a 
conservative  approach  to  choosing  m  sets  m  =  1;  the 
idea  is  that  if  we  choose  m  too  large,  the  penalty  for 
doing  so  can  be  arbitrarily  bad,  whereas  the  downside 
risk  associated  with  choosing  m  =  1  is  an  inflation  in 
the  MSE  that  is  of  asymptotic  order  at  most  -b 
46^cH-c^)/(46^^^). 

We  turn  now  to  justifying  Al.  Many  different  ap¬ 
proaches  are  available,  but  we  shall  focus  on  providing 
a  coupling  argument  for  Al;  see  Lindvall  (1992)  to  get 
a  sense  of  the  scope  and  power  of  this  method.  In  any 
case,  we  shall  use  coupling  to  prove  Al  for  a  discrete¬ 
time  simulation  output,  in  which  X  =  {X^  :  n>0) 
takes  the  form  Xn  =  f{Zn),  where  Z  =  {Zn  :n>0) 
is  a  Markov  chain  taking  values  in  some  state  space 
5,  and  /  :  S  — >  R  is  a  bounded  function.  Let 
P”^(a:,  ’)  =  P{Zm  €  '\Zo  ~  x)  he  the  m-step  tran¬ 
sition  kernel  of  Z.  We  shall  require  that  there  exist  a 
constant  A  >  0,  m  >  1,  and  a  probability  distribution 
ip  on  S  such  that 

(5) 

for  all  X  e  5;  condition  (5)  is  closely  related  to  the 
concept  of  Doeblin  recurrence.  (Note  that  any  finite- 
state,  aperiodic,  irreducible  Markov  chain  automati¬ 
cally  satisfies  (5).)  It  is  well  known  that  (5)  implies 
that  Z  possesses  a  unique  (steady-state)  stationary 
distribution  tt;  see  Meyn  and  Tweedie  (1993)  for  de¬ 
tails. 


For  a  given  function  p  :  5  — +  IR,  let 


where  fc{x)  =  f{x)  -  Ef{Zo).  Hence, 


||5||  =  sup{\g{x)\  -.xeS}. 

Also,  let  Z*  be  a  stationary  version  of  Z  (ie.  a  Markov 
chain  on  5  with  the  same  transition  law  as  Z,  but  ini¬ 
tiated  with  the  stationary  distribution  tt).  Coupling 
plays  a  key  role  in  establishing  the  following  inequal¬ 
ity. 

Proposition  2  Suppose  (5)  is  satisfied.  Then  there 
exists  p  with  |p|  <  1  such  that 

\Ef{Zi)g{Zi+:i)-Ef{Z*o)g{Z*)\  < 

Proof:  Let  Zo  and  Zq  be  chosen  independently  ac¬ 
cording  to  their  respective  initial  distributions  and  set 
Q[x,  •)  =  {P^{x,  •)  -  Av?(-))/(l  -  A).  With  probabil¬ 
ity  A,  we  can  force  Zm  =  by  requiring  that  both 
chains  distribute  themselves  according  to  (p\  with 
probability  1  —  A,  we  allow  Zm  and  to  distribute 
themselves  independently  according  to  (5(Zo,  •)  and 
(5(Zo,')  respectively.  The  intermediate  values  (Zi, 
. . . ,  Zm^i)  and  (Zj , . . . ,  Z^_f)  are  then  generated  in¬ 
dependently  according  to  their  respective  conditional 
distributions  (conditional  on  the  initial  state,  and  the 
state  at  time  m).  If  Zm  =  Z^^  we  then  let  the 
two  chains  run  together  henceforth,  setting  Zn  =  Z* 
for  n  >  m.  Otherwise,  we  repeat,  conditional  on 
(Zm,  Z^),  the  construction  above,  so  that  with  prob¬ 
ability  A,  Z2m  and  ZJm  agree.  Again,  if  we  get  agree¬ 
ment,  we  set  Zn  =  Z*  for  n  >  2m.  This  process  gets 
repeated  at  multiples  of  time  m  until  we  get  an  agree¬ 
ment  in  Z  and  Z*,  after  which  we  let  the  two  chains 
run  together.  This  construction  establishes  the  ex¬ 
istence  of  a  coupling  time  T  such  that  Zn  =  Z*  for 
n  >  r,  and  P{T  >  n)  <  (1  -  . 

Clearly 

\Ef{Zi)g{Zi+i)-Ef{Z*o)g{Z;)\ 

=  \EnZi)g{Zi^j)  -  ES{Z;)g{Z:^^)\ 

=  \E[f{Z,)g{Zi+i)  -  f{Z:)g{ZUjy,T>  i]| 

<  2||/|||yP(T>i) 

<  2||/||||<7||{1-A)L‘/H, 

proving  the  result. 

□ 

Turning  now  to  (3),  proposition  2  establishes  that 
\EnZi)-Ef{Z*o)\  =  Oip^). 
Consequently, 

f^\EUZi)\<<x> 

i=0 


E-^Xj  =  Ef{ZS)  +  -'^EfciZj)  (6) 

=  Ef{Zl)  +  -^EUZi) 

^  j=0 

1  °° 

^  Ef{Z^)  +  ^+Oipn, 

where  b  =  o  ^fc{Zj)-  To  prove  (4)  requires  more 
work.  For  x  e  S,  let  Ex{-)  denote  the  expectation 
operator  conditional  on  Zo  =  x,  and  set 

C50 

h{x)  =  ^ExMZk). 
k^l 

Proposition  2  guarantees  that  is  a  bounded  function 
(since  we  are  assuming  /  is).  Observe  that 

EC£fc(Zi)f  =  (7) 


+  2^^EMZi)UZi+j) 


i=0  i-1 


=  Y^EfliZi) 

i=0 
n— 1 

■  +  2Y;^EuZi)\h{Zi)-h{Zr:)]. 

i=0 

As  in  the  derivation  of  (6), 

Y,  EfliZi)  +  2Y  EUZi)h{Zi)  (8) 

t=0  i^o 

=  (n  +  l)Ef^{Z^)  +  2nEfc{Z^)h{Z^) 

+  Av  +  0(p") 

where 

«  =  YiEfUZi)  -  EfUZ*o)} 

i=0 

+  2Y{EhiZi)h{Zi)  -  EUZ*o)hiZ*o)}. 


Furthermore, 


Ln/2J 

Y  EUZi)h{Zn) 

i=0 


(9) 


[n/2\ 

<  Y.  \Efc{Zi)P^^-^h[Zi)\ 

i~0 

Ln/2J 

<  Y  ^\fc{Zi)\  sup  \EMZk)\ 

i=0  fc>n/2-l 

xes 

<  n||/||0(p"/2) 

=  0(p"/^) 

and 

n— 1 

Efc{Zi)h{Zn)  (10) 

i=  rn/21 

n 

=  Y  EUZ*oMZ:_,)  +  0{p^/^) 

t=  rn/21 

-  EUZ*o)k{Z*o)  +  0{p^/^), 

where 

oo 

j=0 

Combining  (7)  -  (10)  yields 


y^/c(-^i)  j  —  (n  +  l)a^  +  d -f  0(/9’^/^), 

/ 

where  =  £;/2(Z^)  -h  2Efc{Z^)h{Z^),  and  d  =  k- 
2EMZ^)h{ZS)  -  2Efc{Z^)k{Z^).  We  have  therefore 
established  the  following  result. 


as  n  oo  (  =>  here  denotes  weak  convergence  of 
stochastic  processes),  where  B  ™  {B{t)  :  t  >  0) 
is  a  standard  Brownian  motion.  If  the  initial  tran¬ 
sient  is  small,  the  idea  is  that  the  output  series  will 
then  indeed  be  well  approximated  by  the  Brownian 
motion  B  as  implied  by  (11),  so  that  the  test  statis¬ 
tic  constructed  from  X  will  have  approximately  the 
same  distribution  as  that  derived  from  S.  On  the 
other  hand,  if  a  significant  initial  transient  is  present, 
this  will  (hopefully)  reflect  itself  in  the  test  statis¬ 
tic  taking  on  a  value  that  is  not  representative  of 
what  is  expected  under  B.  Such  unusual  values  of 
the  test  statistic  then  lead  to  a  rejection  of  the  hy¬ 
pothesis  that  the  simulation  has  no  initial  transient. 
Schruben  (1982)  proposes  a  test  statistic  based  on 
both  the  magnitude  and  location  of  the  maximizer  of 
a  “standardized”  version  of  the  output  series  X. 

In  this  section,  we  will  study  the  question  of  how 
large  an  initial  transient  must  be  present  in  order  that 
test  statistics  derived  from  the  output  series  will  have 
corresponding  distributions  that  are  not  well  approx¬ 
imated  by  Brownian  motion.  This  essentially  comes 
down  to  the  question  of  how  large  an  initial  transient 
must  be  present,  in  order  that  the  functional  CLT 
(11)  break  down. 

It  is  difficult  to  analyze  this  question  in  full  gen¬ 
erality.  Instead,  our  analysis  focusses  on  a  specific 
class  of  output  processes  that  has  qualitative  struc¬ 
ture  representative  of  a  large  class  of  discrete- event 
simulations.  Specifically,  we  shall  be  concerned  with 
a  discrete- time  output  series  {Xn  '  n  >  0),  in  which 
the  XnS  evolve  according  to  an  autoregressive  rela¬ 
tion  of  the  form 


Theorem  1  If  (5)  holds,  and  f  is  hounded,  then  Al 
is  satisfied. 

The  argument  given  above  can  be  generalized  sub¬ 
stantially.  As  a  consequence,  we  believe  that  the 
asymptotic  result  contained  in  proposition  1  holds  in 
great  generality  in  the  discrete-event  simulation  set¬ 
ting. 

3  THE  SENSITIVITY  OF  AN  INITIAL 
TRANSIENT  DETECTION  TEST 

In  this  section,  we  study  a  test  proposed  by  Schru¬ 
ben  (1982)  for  detecting  the  presence  of  an  initial 
transient  in  a  steady-state  simulation.  It  takes  ad¬ 
vantage  of  the  fact  that  the  CLT  (2)  can  typically  be 
strengthened  to  a  functional  CLT.  Such  a  functional 
CLT  demands  that 

^1/2  \ 


+  In+l  (12) 

where  V  =  (Vn  :  n  >  0)  is  the  associated  innova¬ 
tions  sequence  and  p  >  0.  Because  we  shall  later  be 
interested  in  studying  the  effect  of  the  autoregressive 
parameter  p,  we  shall  embed  (12)  in  a  family  of  au¬ 
toregressions: 


Xn-j~l{Tn)  —  pfnXfi^Ttl)  + 

The  simple  dynamics  of  this  system  ensures  that  we 
can  write  Xn(m)  in  terms  of  the  innovation  sequence, 
namely 


Xn{m)  =  p^Mm)  + 

Consequently,  assuming  \pm\  <  1, 


.  LntJ 

-ZXiM 

j^O 


1 

n  1- Pm 


) 


■Xo(m)  (13) 


[ntj  lnt\-j 

+  E 

”  j=l  k=0 

In  order  to  proceed  further,  we  need  to  impose 
some  structure  on  the  VnS.  Specifically,  we  will  as¬ 
sume  that  the  V^s  are  i.i.d.  r.v.’s  for  which  EVn  —  7, 
var  Vn  =  7],  and  E\Vn\^^^  <  00  for  some  e  >  0.  Then, 
we  can  assume  existence  of  a  standard  Brownian  mo¬ 
tion  B  =  ( B{t)  :t  >0)  such  that 

n 

==n7-l-?7S(n)H-o(n^/^)  a.s.  (14) 

fc=i 


as  n  — ^  00;  (14)  is  known,  in  the  literature,  as  a 
“strong  approximation”  for  the  V^s.  (See  Csorgo  and 
R^vesz  (1981)  and  Philipp  and  Stout  (1975)  for  ad¬ 
ditional  details.) 

Substituting  (14)  into  (13)  and  simplifying,  we  get 

„l/2  /i  \ 

—  (15) 


as  n  — ►  00,  where  a(m)  =  7(1  —  Pm)'~^  and  <7{m)  — 
^(1  Pm)“^*  Noting  that  B{n’)/y/n  has  the  same 

distribution  as  does  B(-),  we  see  that  the  functional 
CLT  (11)  is  valid  so  long  as  the  other  terms  on  the 
right-hand  side  of  (15)  converge  to  zero  as  n  00. 
Given  that  the  test  of  Schruben  (1982)  uses  only  val¬ 
ues  of  t  G  [0, 1],  we  note  then  that 

rjy/m 


uniformly  in  f  €  [0, 1],  provided  Xo(m)  —  o{y/m).  As 
for  the  other  major  term  on  the  right-hand  side  of 
(15),  we  note  that 
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=  (1  -/0m)“7  +  0((l  -Pm)"‘)  a.s., 


where  =  denotes  equality  in  distribution.  Conse¬ 
quently,  in  order  that 


[mtj 

V 


m 


converge  in  probability  to  zero  uniformly  in  f  €  [0, 1], 
it  must  be  that  m“^/^(l  -  Pm)“^  0  as  m  --4  00, 

Hence,  our  analysis  of  the  autoregressive  model  (12) 
leads  to  the  following  conclusion. 

If  the  time  horizon  m  is  large,  we  should  not 
expect  any  test  based  on  (11)  to  detect  the 
presence  of  an  initial  transient  unless  either: 

1.  the  initial  condition  Xq  of  the  autore¬ 
gression  (12)  is  of  order  y/m  or  larger, 
or 

2.  the  auto-correlation  parameter  is  such 
that  p  is  of  the  order  1  —  cl^Jm  or 
larger. 


Thus,  this  suggests  that  this  class  of  tests  can  be 
effective  for  dealing  with  initial  transients  if  either  the 
autocorrelations  in  the  simulation  decay  slowly,  or  if 
the  initial  condition  for  the  simulation  is  significantly 
non-  stationary;  otherwise,  the  tests  should  not  be 
expected  to  detect  transient  effects. 
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